Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
Abstract
Identifying similar and related words is not only key to natural language understanding but also a suitable task for assessing the quality of computational resources that organise the words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches to this task. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to the several lexical knowledge bases (LKBs) that have been available for this language for a longer time. These resources were exploited to answer word similarity tests, which have also recently become available for Portuguese. We conclude that there are several valid approaches to this task, but none that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity; in general, however, better results are obtained when knowledge from different sources is combined.
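As a rough illustration of the idea of combining sources, the sketch below scores a word pair with a distributional measure (cosine between word vectors) and a knowledge-based measure (overlap of related words in an LKB), then interpolates the two. Everything here is an assumption made for illustration: the toy vectors, the toy neighbour sets, the Jaccard measure, the `weight` parameter, and the function names are hypothetical and are not taken from the paper, which may combine its resources differently.

```python
import math

# Toy distributional vectors (hypothetical values, not from a real model).
VECTORS = {
    "carro": [3.0, 4.0],
    "automóvel": [4.0, 3.0],
    "banana": [-4.0, 3.0],
}

# Toy LKB: each word mapped to the set of words it is related to (hypothetical).
NEIGHBOURS = {
    "carro": {"veículo", "automóvel"},
    "automóvel": {"veículo", "carro"},
    "banana": {"fruta"},
}


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Share of related words that two LKB entries have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(w1, w2, weight=0.5):
    """Linear interpolation of the distributional and LKB-based scores."""
    distributional = cosine(VECTORS[w1], VECTORS[w2])
    knowledge_based = jaccard(NEIGHBOURS[w1], NEIGHBOURS[w2])
    return weight * distributional + (1.0 - weight) * knowledge_based


if __name__ == "__main__":
    for pair in [("carro", "automóvel"), ("carro", "banana")]:
        print(pair, round(combined_similarity(*pair), 3))
```

In a word similarity test, scores of this kind would be computed for every pair in the test and compared with the human judgements, typically through a rank or linear correlation.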
Similar resources
Measuring Semantic Similarity and Relatedness with Distributional and Knowledge-based Approaches
This paper provides a survey of different techniques for measuring semantic similarity and relatedness of word pairs. This covers both knowledge-based approaches exploiting taxonomies like WordNet, and corpus-based approaches which rely on distributional statistics. We introduce these techniques, provide evaluations of their result performance, and discuss their merits and shortcomings. A speci...
Distributional Semantics beyond Concrete Concepts
In the last decade, corpus-based distributional models of semantic similarity and association have slipped into the mainstream of cognitive science and computational linguistics. On the basis of the contexts in which a word is used, they claim to capture certain aspects of word meaning and human semantic space organization. In computational linguistics, these models have been used to automatica...
Predicting human similarity judgments with distributional models: The value of word associations
Most distributional lexico-semantic models derive their representations based on external language resources such as text corpora. In this study, we propose that internal language models that are more closely aligned to the mental representations of words could provide important insights into cognitive science, including linguistics. Doing so allows us to reflect upon theoretical questions reg...
Word meaning in context: a probabilistic model and its application to question answering
The need for assessing similarity in meaning is central to most language technology applications. Distributional methods are robust, unsupervised methods which achieve high performance on this task. These methods measure similarity of word types solely based on patterns of word occurrences in large corpora, following the intuition that similar words occur in similar contexts. As most Natural La...
A Hybrid Distributional and Knowledge-based Model of Lexical Semantics
A range of approaches to the representation of lexical semantics have been explored within Computational Linguistics. Two of the most popular are distributional and knowledge-based models. This paper proposes hybrid models of lexical semantics that combine the advantages of these two approaches. Our models provide robust representations of synonymous words derived from WordNet. We also make use ...
Journal title:
- Information
Volume 9
Publication date 2018